Method and system for multiplex genetic analysis

ABSTRACT

The present disclosure provides apparatus, systems and methods for detecting separately and substantially simultaneously light emissions from a plurality of localized light-emitting analytes. A system according to exemplary embodiments of the present disclosure comprises a sample holder having structures formed thereon for spatially separating and constraining a plurality of light-emitting analytes each having a single nucleic acid molecule or a single nucleic acid polymerizing enzyme, a light source configured to illuminate the sample holder, and an optical assembly configured to collect and detect separately and substantially simultaneously light emissions associated with the plurality of light-emitting analytes. The system may further include a computer system configured to analyze the light emissions to determine the structures or properties of a target nucleic acid molecule associated with each analyte.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims a priority benefit under 35 U.S.C. § 119(e) from U.S. Provisional Application No. 60/689,692 filed Jun. 10, 2005, which is incorporated herein by reference.

FIELD

The present application relates to molecular analysis, and more particularly to methods and systems for multiplex genetic analysis of single molecule nucleic acid synthesis.

INTRODUCTION

The information stored in a DNA molecule depends on particular sequences of nucleotides, which are bases or building blocks of the DNA molecule. DNA sequencing allows the determination of the nucleotide sequence of a particular DNA segment. A conventional method of DNA sequencing starts with a defined fragment of a DNA molecule as a template. Based on this template, a population of molecules differing in size by one base of a known composition is generated. The population of molecules is then fractionated based on size using, for example, acrylamide or agarose gel electrophoresis of single-stranded DNA molecules. The base at the truncated end of each of the fractionated molecules is thereafter determined to establish the nucleotide sequence.

A sequencing method called dideoxy sequencing was developed by Fred Sanger. His method is based on DNA synthesis in the presence of dideoxy nucleotides (ddNTP), which differ from normal deoxynucleotides (dNTP) in that they lack a 3′-hydroxyl group so that once a dideoxy nucleotide is incorporated, it will terminate strand synthesis. The procedure for dideoxy sequencing starts with setting up four reactions each in a different tube containing the single strand DNA to be sequenced, labeled (tagged) primer, DNA polymerase, normal dNTPs, and a different ddNTP (i.e. for A, T, C, or G). A dideoxy nucleotide will be incorporated, randomly, at each point the corresponding nucleotide occurs in the template strand. Each time a dideoxy nucleotide is incorporated, it will stop further DNA replication. This will generate a set of fragments of various lengths, each fragment corresponding to the point at which there is a nucleotide complementary to the dideoxy nucleotide. The fragments are then separated based on their length by electrophoresis. With the smaller fragments migrating faster, the sequence can be determined by associating the base composition with each fragment.

The above technique for DNA sequencing suffers from the disadvantage that sample preparation is relatively complex in order to ensure that the tubes contain the same DNA molecules or fragments of the same DNA molecules to be sequenced. This leads to increased costs and the possibility of error. A simpler method results if molecule-based investigation techniques are used to observe the synthesis of a single DNA molecule. Because only one molecule is being observed, there is no need to ensure that all of the surrounding molecules are the same.

Specialized tools for imaging and spectroscopy have been developed to characterize nanomaterials and nanomaterial-related phenomena. Techniques for constructing these tools comprise near-field scanning optical microscopy (NSOM) and single molecule spectroscopy (SMS). These techniques offer unique capabilities for investigating properties at the molecular level owing to their high spatial resolution, chemical sensitivity, and their ability to determine dynamical properties such as molecule binding/unbinding kinetics and the structural dynamics of polymers. For example, a sample-scanning confocal fluorescence microscope using SMS developed by McNeil et al. has demonstrated spatial resolution of ~400 nm, and single molecule sensitivity. It uses a detector system having a single-photon avalanche diode and a sensitive TE-cooled CCD spectrometer, permitting the ability to monitor fluorescence in the range of 400 to 1100 nm at a resolution of 20 nm and the ability to conduct time-lapse fluorescence spectroscopy with single molecule sensitivity.

The single-molecule techniques described above, however, often employ femtoliter-scale observation volumes and require the use of picomolar to nanomolar sample concentrations to ensure that on average only one molecule will be present in the sample volume. These concentrations are far lower than those that normally occur in nature. Thus, molecule dynamics that are affected by concentration cannot be suitably tested using the techniques. To overcome the deficiencies of NSOM and SMS techniques, other developments have been proposed. For example, Levene et al. describe a device for single molecule analysis employing a sample plate, which has 50 nm-diameter holes in a 100 nm thick aluminum film on a fused silica coverslip. When the holes are illuminated from under the fused silica coverslip, the holes act as zero-mode waveguides prohibiting the light from going through the aluminum film because the diameter of the holes is much smaller than the wavelength of the light, which is about 400-700 nm. The light, however, does generate an evanescent field that extends about 10 nm into the cavity of each illuminated hole, producing a zeptoliter-scale effective observation volume near the opening of the hole. See Levene et al., US Patent Application Publication Number 2003/0174992 A1, which is incorporated herein by reference.

The small observation volume provided by the zero-mode waveguides described by Levene, however, raises other challenges spanning sample preparation, signal detection, noise or background suppression, data collection, and data analysis algorithms. Accordingly, significant further developments are needed.

The present teaching in one aspect comprises an affordable high-sensitivity and high-throughput system and method for single-molecule analysis that performs at a lower cost relative to conventional systems used in sequencing, resequencing, and SNP detection. These and other features of the present teaching are set forth herein.

SUMMARY

The present disclosure provides apparatus, systems and methods for analyzing a plurality of molecules by detecting separately and substantially simultaneously light emissions from a plurality of localized light-emitting analytes each including a single one of the plurality of molecules. The detected light emissions, after being properly analyzed, can be used to deduce the structure or properties of each of the plurality of molecules. In some embodiments, the apparatus, systems and methods can be used for nucleic acid sequencing, nucleic acid resequencing, and/or detection and/or characterization of single nucleotide polymorphism (SNP analysis) including gene expression.

In various embodiments, the present invention can provide an apparatus for sequencing a plurality of target nucleic acid molecules including a sample holder configured to separate and confine a plurality of source points each including a single one of the target nucleic acid molecules, a fraction of a nucleic acid molecule, or a nucleic acid polymerizing enzyme molecule, a light source configured to direct excitation light toward the sample holder at an angle with respect to a normal of the sample holder, the excitation light illuminating the source points and causing the source points to fluoresce, at least one detector, and an optical assembly configured to collect fluorescent signals from illuminated source points to form images of the source points on the at least one detector.

In various embodiments, the present invention can provide a method for sequencing a plurality of target nucleic acid molecules, including subjecting a plurality of source points of a sample holder to nucleic acid polymerization reactions, wherein the source points each include fluorescence-labeled bases, primers, and at least one nucleic acid polymerizing enzyme molecule, and wherein the plurality of source points each has a single one of the target nucleic acid molecules, directing excitation light toward the sample holder at an angle with respect to a normal of the sample holder to illuminate the source points and to cause the source points to fluoresce, and collecting fluorescent signals from the illuminated source points and focusing the fluorescent signals onto at least one detector to form images of the source points on the at least one detector to determine time sequences of base incorporations in the polymerization reactions.

In various embodiments, the present invention can provide a method for sequencing a plurality of target nucleic acid molecules, including subjecting a plurality of source points of a sample holder to nucleic acid polymerization reactions, wherein the source points each include fluorescence-labeled bases, primers, and at least one of the target nucleic acid molecules, and wherein the plurality of source points each has a single nucleic acid polymerizing enzyme molecule, directing excitation light toward the sample holder at an angle with respect to a normal of the sample holder to illuminate the source points and to cause the source points to fluoresce, and collecting fluorescent signals from the illuminated source points and focusing the fluorescent signals onto at least one detector to form images of the source points on the at least one detector to determine time sequences of base additions in the polymerization reactions.

In various embodiments, the present invention can provide a method for sequencing a plurality of target nucleic acid molecules, including enriching a sample holder with a plurality of source points each having a single one of the target nucleic acid molecules and/or a single nucleic acid polymerizing enzyme molecule, subjecting the plurality of source points to nucleic acid polymerization reactions, (1) wherein when the source points have a single one of the target nucleic acid molecules, the source points each further include fluorescence-labeled bases, primers, and at least one nucleic acid polymerizing enzyme molecule, (2) wherein when the source points have a single nucleic acid polymerizing enzyme molecule, the source points each further include fluorescence-labeled bases, primers, and at least one of the target nucleic acid molecules, and (3) wherein when the source points have a single one of the target nucleic acid molecules and a single nucleic acid polymerizing enzyme molecule, the source points each further include fluorescence-labeled bases and primers, directing excitation light toward the sample holder at an angle with respect to a normal of the sample holder to illuminate the source points and to cause the source points to fluoresce, and collecting fluorescent signals from the illuminated source points and focusing the fluorescent signals onto at least one detector to form images of the source points on the at least one detector to determine time sequences of base incorporations in the polymerization reactions.

A system according to exemplary embodiments of the present disclosure comprises a sample holder having structures formed thereon for spatially separating and constraining a plurality of light-emitting analytes each having a single one of the plurality of molecules to be analyzed. In exemplary embodiments, each of the plurality of molecules is a single nucleic acid molecule, a fraction of the nucleic acid molecule, an oligonucleotide molecule, or a single nucleic acid polymerizing enzyme. The system further comprises a light source configured to illuminate the sample holder, and an optical assembly configured to collect and detect separately and substantially simultaneously light emissions associated with the plurality of light-emitting analytes. The system may further include a computer system configured to analyze the light emissions to determine the structures or properties of a target nucleic acid molecule associated with each analyte.

In one exemplary embodiment of the present invention, the light source is configured to produce excitation light that is directed toward the sample holder at an angle with respect to a normal of a plane associated with the sample holder. In further embodiments, excitation light is directed toward the sample holder such that total internal reflection occurs and the excitation light is recycled multiple times before exiting the sample holder.
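
By way of non-limiting illustration, the incidence angle needed for total internal reflection can be estimated from Snell's law. The sketch below assumes a fused silica substrate with a refractive index of about 1.46 bounded by air or by an aqueous fluid. These index values and the function name `critical_angle_deg` are illustrative assumptions and are not taken from the present disclosure.

```python
import math


def critical_angle_deg(n_substrate: float, n_outside: float) -> float:
    """Return the angle from the surface normal (in degrees) above which
    light travelling inside the substrate is totally internally reflected."""
    if n_outside >= n_substrate:
        raise ValueError("total internal reflection requires n_outside < n_substrate")
    return math.degrees(math.asin(n_outside / n_substrate))


# Assumed, approximate refractive indices (not from the disclosure).
N_FUSED_SILICA = 1.46
N_AIR = 1.00
N_WATER = 1.33

print(critical_angle_deg(N_FUSED_SILICA, N_AIR))
print(critical_angle_deg(N_FUSED_SILICA, N_WATER))
```

Under these assumptions, light striking a silica-air boundary at more than roughly 43 degrees from the normal is reflected back into the substrate, and a silica-water boundary requires a steeper angle of roughly 66 degrees.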

In one exemplary embodiment of the present invention, the optical assembly comprises at least one pixelated sensor device such as a charge-coupled device (CCD) detector or CMOS detector configured to detect substantially simultaneously light emissions from the multiple localized light-emitting analytes. In further embodiments, the optical assembly is configured to disperse spectrally the light emitted from the multiple localized light-emitting analytes onto the detector(s) so that different frequency bands of the emitted light are detected by different areas of the detector(s).

These and other features of the present teaching are set forth herein.

BRIEF DESCRIPTION OF THE DRAWINGS

The skilled artisan will understand that the drawings, described below, are for purposes of illustration only, and are not intended to limit the scope of the present teaching in any way.

FIG. 1 is a block diagram of an exemplary embodiment of a high-throughput system for single molecule analysis.

FIG. 2 is a top view of a sample holder in the system.

FIG. 3 is a cross-sectional view of a portion of the sample holder.

FIG. 4 is a flowchart illustrating an exemplary embodiment of a method useful for enriching the sample holder with a plurality of spatially restrained source points.

FIG. 5 is a diagram illustrating further embodiments of a method useful for enriching the sample holder with a plurality of spatially constrained source points.

FIG. 6 is a 3-dimensional view of a portion of the sample holder according to further embodiments of the present teaching.

FIG. 7 is a diagram illustrating a DNA sequencing process along a channel on the sample holder according to exemplary embodiments of the present teaching.

FIGS. 8A and 8B are flowcharts illustrating further embodiments of a method useful for placing a plurality of source points on the sample holder in FIG. 6.

FIGS. 8C and 8D are diagrams illustrating embodiments of a setup for stretching each of a plurality of oligonucleotide molecules along a bottom surface of a channel on the sample holder.

FIG. 9 is a block diagram illustrating exemplary embodiments of an optical arrangement useful for illuminating the source points placed on the sample holder.

FIGS. 10A-10E are block diagrams illustrating further embodiments of an optical arrangement useful for illuminating the source points placed on the sample holder.

FIG. 11 is a top view of a sample holder showing a plurality of source points.

FIG. 12 is a top view of an image plane in a detector in the system in FIG. 1 according to exemplary embodiments.

FIG. 13 is a diagram illustrating an optical assembly for detecting light signals from the source points on the sample holder according to exemplary embodiments.

FIGS. 14A-14C are diagrams illustrating further embodiments of an optical assembly for detecting light signals from the source points on the sample holder.

FIG. 15A is a block diagram illustrating a frame transfer CCD array in a detector in the system in FIG. 1 according to exemplary embodiments of the present teaching.

FIG. 15B is a block diagram illustrating an interline CCD array in the detector in the system in FIG. 1 according to exemplary embodiments of the present teaching.

FIG. 16A shows examples of normalized fluorescent spectra corresponding to four different fluorescent dyes.

FIG. 16B is a block diagram illustrating full spectrum data.

FIG. 16C is a block diagram of the full spectrum data with the most informative wavelengths distinguished from less informative wavelengths.

FIG. 17A is a flowchart illustrating a method for reading out CCD data associated with a plurality of source points according to exemplary embodiments.

FIGS. 17B-17C are each a spreadsheet for estimating the throughput of reading out data from a CCD array using the method in FIG. 17A.

FIG. 18 is a block diagram of a computer system used in the system in FIG. 1 according to exemplary embodiments of the present teaching;

FIG. 19 is a histogram illustrating the number of photons in different spectral bins detected from a single incorporation event;

FIG. 20 is a flowchart of an exemplary embodiment of a method useful for base determination according to the present teaching; and

FIG. 21 is a plot of composite data over a plurality of time bins according to exemplary embodiments of the present teaching.

DESCRIPTION OF VARIOUS EMBODIMENTS

It is to be understood that both the foregoing summary and the following description of various embodiments are exemplary and explanatory only and are not restrictive of the present teachings. In this application, the use of the singular includes the plural unless specifically stated otherwise. Also, the use of “or” means “and/or” unless stated otherwise. Similarly, “comprise,” “comprises,” “comprising,” “include,” “includes,” and “including” are not intended to be limiting.

Additionally, while certain embodiments are described in detail herein, particularly embodiments suitable for analysis of single molecule nucleic acid synthesis, it is to be understood the apparatus, systems and methods of the present disclosure may be employed in other applications for analysis of single molecules, such as but not limited to directed resequencing, SNP detection, and gene expression.

Furthermore, the figures in this application are for illustration purposes and many of the figures are not to scale with corresponding hardware. Many parts of the features in the figures in this application are purposefully drawn out of scale for ease of illustration.

Systems according to some embodiments of the present disclosure generally comprise a sample holder configured to hold a plurality of localized light-emitting analytes each comprising a single one of a plurality of molecules to be analyzed, a light source configured to illuminate the sample holder, and an optical assembly configured to collect and detect light emitted from the light-emitting analytes.

FIG. 1 is a block diagram of an exemplary embodiment of a system 100 for detecting and analyzing light emitted from the plurality of light-emitting analytes. As shown in FIG. 1, system 100 comprises a sample holder 110, a light source 120, and an optical assembly 130. System 100 may further comprise a host computer system 140 (see also FIG. 18) configured to analyze optical data detected by optical assembly 130. System 100 may also comprise one or more digital signal processors (DSP) or field programmable gate arrays (FPGA) 150 coupled between optical assembly 130 and host computer system 140. DSPs or FPGAs 150 can be used to execute algorithms for base determination, as explained in more detail below.

System 100 may optionally comprise an index-matching prism 108. A space between the sample holder 110 and the optical assembly 130 may be filled with a fluid 104. The utility of the index-matching prism 108 and the fluid 104 is discussed below. Although, for reasons discussed below, it may be advantageous to direct the excitation light from the light source 120 to the sample holder 110 at an angle, as shown by the solid line 121 in FIG. 1, system 100 is not limited to such use, and the excitation light can be directed via a dichroic filter 125 toward the sample holder along a normal N of the sample holder 110, as shown by the dashed lines in FIG. 1.

In exemplary embodiments of the present teaching, sample holder 110 is configured to support and confine the plurality of light-emitting analytes. For ease of discussion, each localized light-emitting analyte will hereafter be referred to as a dye or a “source point”. In various embodiments, a dye or a source point comprises a single nucleic acid molecule, a fraction of a nucleic acid molecule, an oligonucleotide molecule, or a single nucleic acid polymerizing enzyme. The dye or source point may also comprise one or more other molecules, constituents, or reactants. The emitted light from the complex can be used to deduce the structure or properties of a target nucleic acid molecule.

In one exemplary embodiment, in applications employing nucleic acid sequencing, each source point is a complex of a single nucleic acid polymerizing enzyme, a target nucleic acid molecule, and at least one incorporated or incorporating fluorescence-labeled nucleotide analog. The source point is localized or spatially constrained in at least one dimension that is less than the wavelength of the excitation light. The fluorescent label on the nucleotide analog emits fluorescent light upon illumination by light source 120. In exemplary embodiments of the present teaching, four different nucleotide analogs are labeled with four different fluorescent dyes each having a unique emission spectrum. The four different fluorescent dyes can also be associated with four different frequency bands each corresponding to a peak in emission intensity according to the respective spectrum. The four different frequency bands are hereafter referred to as first, second, third, and fourth frequency bands.
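
By way of illustration only, once photon counts have been accumulated in the first through fourth frequency bands for a single incorporation event, the labeled nucleotide can in the simplest case be identified by selecting the band with the largest count. The sketch below shows one such approach and is not the base determination method of the present teaching. The assignment of bases to bands in `BAND_TO_BASE`, the `min_photons` threshold, and the function name `call_base` are hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical assignment of the first..fourth frequency bands to bases.
BAND_TO_BASE = ("A", "C", "G", "T")


def call_base(band_counts: Sequence[float], min_photons: float = 0.0) -> Optional[str]:
    """Return the base whose frequency band collected the most photons,
    or None when the total signal falls below min_photons."""
    if len(band_counts) != len(BAND_TO_BASE):
        raise ValueError("expected one photon count per frequency band")
    if sum(band_counts) < min_photons:
        return None  # signal too weak to attribute to an incorporation event
    strongest = max(range(len(band_counts)), key=lambda i: band_counts[i])
    return BAND_TO_BASE[strongest]


# Example: most photons fall in the second frequency band.
print(call_base([3, 40, 5, 2]))
```

A peak-band rule of this kind ignores spectral overlap between dyes, so it is best read as a starting point rather than a complete classifier.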

Thus, the time sequence of base incorporation can be observed by detecting fluorescent signals from sequentially incorporated nucleotide analogs associated with a source point. The fluorescent light signals from different source points on the sample holder 110 are substantially simultaneously collected and detected by optical assembly 130 and are analyzed by computer system 140 to determine the identities of the incorporated nucleotides in each of the source points. To reduce or eliminate interference between fluorescent signals associated with consecutive incorporation events on a same source point, after detection of an incorporation event, the fluorescent label on the newly incorporated nucleotide can be bleached, cleaved or otherwise removed with a known technique. Photo-cleavable linkers may be utilized to facilitate efficient and consistent removal of the fluorescent labels.

In some embodiments, the source points are localized or spatially constrained at different locations on sample holder 110 by immobilizing the single nucleic acid molecule or the single nucleic acid polymerizing enzyme in each source point at one of the locations. This allows separate and substantially simultaneous detection of fluorescent emission from the plurality of source points. A conventional method or one of the methods discussed below can be used to immobilize the enzymes or the template nucleic acid molecules.

FIG. 2 is a block diagram of a top-down view of sample holder 110 according to exemplary embodiments. As shown in FIGS. 1 and 2, in exemplary embodiments, sample holder 110 comprises a substrate 112 made of a material transparent to the excitation light from light source 120 and to the fluorescent emissions from the source points. A metallic film 114 is formed on a top surface of substrate 112. Depending on specific applications, for reasons discussed below, metallic film 114 may extend to the side surfaces and edge portions of a bottom surface of the substrate 112, as shown in FIG. 1. Sample holder 110 may further comprise a sealer 115 and a cover 116 for evaporation control. A space 118 is formed between the cover 116 and the substrate 112, which space serves as a sample chamber for holding a sample fluid that supplies at least one of the constituents or reactants in each source point. In various embodiments, in applications of nucleic acid sequencing, the sample fluid comprises a fluorophore solution of different types of fluorescent-labeled nucleotides. Sample holder 110 may further comprise a fill hole 230 for filling the sample chamber 118 with the sample fluid and a drain hole 240 for draining the sample fluid from the sample chamber. Fill hole 230 and drain hole 240 are preferably located near two opposite corners of sample holder 110, as shown in FIG. 2, for more complete draining and washing away of sample fluid.

As shown in FIG. 2, sample holder 110 is configured to hold a plurality of spatially separated and constrained source points 210 in a field of view 220 of the optical assembly 130. The spatial separation and confinement of the source points 210 help in one aspect to detect light signals from the source points 210 separately and substantially simultaneously. Although FIG. 2 shows that the source points 210 on the sample holder 110 are arranged in an array having two rows and a number of columns, such arrangement is not necessary as long as the source points are sufficiently spaced from each other so that the light signals from them can be effectively resolved by optical assembly 130. When the source points are arranged in an array, the array may be perfect, meaning each array element site has an immobilized functional source point, or imperfect, meaning at least one array element site is missing a source point, has a source point that is not functional, or has multiple source points that are too close together to allow resolution by the optical assembly 130.

In various embodiments, the metallic film 114 on the top surface of the substrate 112 has etched patterns forming cavities for housing the plurality of source points and separating the plurality of source points to allow resolution by the optical assembly 130. In some embodiments, zero-mode waveguides, such as those described in Patent Application Number US 2003/0174992 by Levene et al., which is incorporated herein by reference, are formed in metallic film 114, as shown in a cross-sectional view in FIG. 3. Zero-mode waveguides are known in the art and can be created using a variety of materials and methods. As a specific, non-limiting example, substrate 112 is a fused silica substrate, metallic film 114 is an aluminum film formed on the fused silica substrate, and an array of holes 310 is formed by masking and plasma etching the aluminum film to create holes 310 in the aluminum film. Each hole 310 has a diameter that is substantially smaller than a wavelength of the excitation light from light source 120 and a depth that is sufficient to block transmission of the excitation light through the hole. Thus, each hole 310 acts as a zero-mode waveguide for the excitation light from light source 120, allowing the excitation light, which comes to the waveguides from the substrate side, to penetrate only a bottom portion 312 of the hole 310. At the same time, the zero-mode waveguides also serve to block light emitted or scattered from the sample fluid on the sample holder 110 except emissions coming from any light emitting agents immobilized in the bottom portions 312 of the waveguides or diffusing past the bottom portions 312 of the waveguides.
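
By way of illustration only, the shallow penetration of excitation light into a sub-wavelength hole can be estimated with the textbook model of an ideal circular waveguide operated below cutoff. The sketch below assumes perfectly conducting walls and the lowest-order TE11 mode, for which the cutoff wavelength in the filling medium is pi times the diameter divided by 1.8412. The 50 nm diameter, the 500 nm excitation wavelength, and the refractive index of 1.33 for the fluid are illustrative assumptions, and a real aluminum film with finite conductivity will behave somewhat differently.

```python
import math

TE11_ROOT = 1.8412  # first zero of the derivative of the Bessel function J1


def cutoff_wavelength_nm(diameter_nm: float) -> float:
    """Cutoff wavelength (measured in the filling medium) of the TE11 mode
    of an ideal circular waveguide with perfectly conducting walls."""
    return math.pi * diameter_nm / TE11_ROOT


def intensity_decay_length_nm(diameter_nm: float,
                              vacuum_wavelength_nm: float,
                              medium_index: float = 1.33) -> float:
    """Distance over which the light intensity falls by a factor of e
    inside a waveguide operated below cutoff (idealized model)."""
    wavelength_in_medium = vacuum_wavelength_nm / medium_index
    cutoff = cutoff_wavelength_nm(diameter_nm)
    if wavelength_in_medium <= cutoff:
        raise ValueError("wavelength is not beyond cutoff; light propagates")
    # Field amplitude decays as exp(-kappa * z); intensity as exp(-2 * kappa * z).
    kappa = 2.0 * math.pi * math.sqrt(1.0 / cutoff**2 - 1.0 / wavelength_in_medium**2)
    return 1.0 / (2.0 * kappa)


# Assumed example values: 50 nm hole, 500 nm light, water-like fluid.
print(cutoff_wavelength_nm(50.0))
print(intensity_decay_length_nm(50.0, 500.0))
```

Under these assumptions the intensity decay length comes out at several nanometers, so the excitation is confined to a thin layer at the bottom portion 312 of the hole.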

Thus, in some embodiments, to allow the detection and analysis of light emitted from the source points 210, each source point 210 is immobilized in the bottom portion 312 of a zero-mode waveguide 310, so that light emitted from the source point can escape the hole 310, pass through substrate 112 and be collected by optical assembly 130. Preferably, only one source point should be present in the bottom portion 312 of a hole 310 because it would be difficult for the optical assembly to distinguish the emitted light from more than one source point in a single hole 310 considering the size of the hole. Therefore, in the exemplary embodiment, holes 310 that either do not have any source point immobilized in the bottom portion 312 or have more than one source point immobilized in the bottom portion 312 do not contribute to the analysis and are considered as empty sites in an array of source points 210.

For ease of discussion, the description hereafter will be illustrated in the context of nucleic acid sequencing, although the methods, systems and apparatus of the present teaching can be applied to other types of molecular analysis. Methods of immobilizing molecules involved in a genetic assay in waveguides 310 are described in detail in US Patent Application Number US 2003/0044781 by Korlach et al., which is incorporated herein by reference. Using the methods described by Korlach, some of the array of holes 310 can each contain a single DNA molecule or enzyme immobilized in the bottom portion 312, while a large percentage of the holes may contain no molecules or multiple molecules and are thus useless in the analysis.

FIG. 4 illustrates a flowchart of one embodiment of a method 400 for enriching the sample holder 110 with source points. Method 400 increases the efficiency and throughput of system 100 by maximizing the percentage of holes 310 that each have a single source point in the bottom portion 312. As shown in FIG. 4, according to exemplary embodiments of the present teaching, method 400 comprises the following steps: step 410, in which uncovered portions of the substrate 112 are coated with streptavidin, and step 420, in which a dilute solution including a plurality of molecules, each being a nucleic acid molecule or a nucleic acid polymerizing enzyme, is applied to the waveguides 310. Each of the plurality of molecules has a photoactivatable biotin attached to it. The concentration of the plurality of molecules in the solution is selected to be lower than the concentration that would maximize single occupancy under a Poisson distribution, so that when the solution is applied to the waveguides 310 on the sample holder 110, most of the waveguides 310 are populated by none of the molecules and statistically few of the waveguides 310 are occupied by more than one of the molecules.
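
By way of illustration only, the occupancy statistics referred to above can be computed directly from the Poisson distribution. The sketch below assumes that the number of molecules in each waveguide is Poisson distributed with a given mean. The function name `occupancy_fractions` and the example means are hypothetical.

```python
import math
from typing import Tuple


def occupancy_fractions(mean_per_waveguide: float) -> Tuple[float, float, float]:
    """Return the expected fractions of waveguides holding zero, exactly one,
    and more than one molecule, assuming Poisson-distributed loading."""
    if mean_per_waveguide < 0:
        raise ValueError("mean occupancy must be non-negative")
    empty = math.exp(-mean_per_waveguide)
    single = mean_per_waveguide * math.exp(-mean_per_waveguide)
    multiple = 1.0 - empty - single
    return empty, single, multiple


# A mean of 1 maximizes single occupancy; a mean of 0.1 is a dilute load.
for mean in (1.0, 0.1):
    empty, single, multiple = occupancy_fractions(mean)
    print(f"mean={mean}: empty={empty:.3f} single={single:.3f} multiple={multiple:.3f}")
```

At a mean of one molecule per waveguide, single occupancy peaks at about 37 percent but about 26 percent of waveguides hold more than one molecule. At a mean of 0.1, about 90 percent of waveguides are empty and fewer than 1 percent hold more than one molecule, which is the regime described above.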

Still referring to FIG. 4, method 400 further comprises step 430 in which a first group of waveguides is identified as each being occupied by at least one nucleic acid molecule or nucleic acid polymerizing enzyme. The first group of waveguides can be identified by using, for example, a simplified sequencing assay. Method 400 further comprises step 440 in which light is shone on each of the first group of waveguides. The light activates the biotin attachment in those waveguides and thus immobilizes the molecules through the biotin-streptavidin bond. Step 440 is followed by an optional step 450 in which the solution is removed from the sample holder by washing or inactivation, leaving only those molecules bonded to the bottom of the waveguides. Step 450 is followed by an optional step 460 in which another dilute solution of the biotin-attached molecules is applied to the waveguides 310 on sample holder 110. Method 400 further comprises step 470 in which an additional group of waveguides is identified as each being occupied by at least one nucleic acid molecule or nucleic acid polymerizing enzyme. In one embodiment, the additional group of waveguides does not overlap with previously identified group(s) of waveguides. Method 400 further comprises step 480 in which light is shone on each of the additional group of waveguides, thus immobilizing the nucleic acid molecules or enzymes in the additional group of waveguides. Method 400 then repeats steps 450-480 until most of the waveguides are populated by bound molecules. Note that steps 450 and 460 are optional because, instead of carrying out steps 450 and 460, one can simply wait for a period of time to allow more of the plurality of molecules to diffuse into some of the waveguides.
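
By way of illustration only, the effect of repeating the loading and photoactivation steps can be estimated with a simple expected-value model. The sketch below assumes that, in each round, every waveguide not yet holding bound molecules receives a Poisson-distributed number of molecules, that every waveguide found to be occupied is illuminated so that its contents are immobilized, and that unbound molecules are removed before the next round. These modeling choices, the function name `enrich`, and the example parameters are assumptions and are not part of method 400 as disclosed.

```python
import math
from typing import Tuple


def enrich(mean_per_round: float, rounds: int) -> Tuple[float, float, float]:
    """Return the expected fractions of waveguides that are still unfilled,
    hold exactly one bound molecule, or hold several bound molecules
    after the given number of load-and-immobilize rounds."""
    p_empty = math.exp(-mean_per_round)
    p_single = mean_per_round * math.exp(-mean_per_round)
    p_multiple = 1.0 - p_empty - p_single

    unfilled, single, multiple = 1.0, 0.0, 0.0
    for _ in range(rounds):
        # Only waveguides that are still unfilled take part in this round.
        single += unfilled * p_single
        multiple += unfilled * p_multiple
        unfilled *= p_empty
    return unfilled, single, multiple


# Assumed example: dilute loading (mean 0.1 per waveguide) repeated 30 times.
unfilled, single, multiple = enrich(0.1, 30)
print(f"unfilled={unfilled:.3f} single={single:.3f} multiple={multiple:.3f}")
```

Under these assumptions, thirty dilute rounds leave about 90 percent of the waveguides holding a single bound molecule, compared with a maximum of about 37 percent for a single loading step governed by the same statistics.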

Optionally, after populating the waveguides with polymerase molecules, a primer is attached to each polymerase molecule by a flexible linker. Attaching the primer to the polymerase molecule helps the analysis because the DNA template would be tethered and not float away, allowing subsequent synthesis to occur on the same template. In one aspect, this benefits the analysis by increasing read lengths and throughput. Longer read lengths help to simplify any fragment assembly problem.

In some embodiments, a method for enriching the sample holder involves the use of nanobeads. As shown in FIG. 5, the sample fluid comprises nanobeads 510, and the enzyme or nucleic acid molecule 520 is attached to a nanobead 510 by a cleavable linker 515, in a manner that most of the nanobeads in the sample fluid each have at most one enzyme or nucleic acid molecule attached. The nanobeads are sized such that only one nanobead is likely to fit in a waveguide 310. The enzyme or nucleic acid molecule 520 has a photoactivatable linker 525 that allows attachment of the composite including the nanobead, the nucleic acid molecule or enzyme, and the cleavable linker 515 to an attachment site 530 at the bottom of a waveguide 310. The attachment is activated by shining light from the bottom of substrate 112. Since only those nanobeads each having an enzyme or nucleic acid molecule with the photoactivatable linker can bind to the substrate 112, and the presence of a nanobead in a waveguide 310 excludes other enzyme or nucleic acid molecules from diffusing into the same waveguide, there is no need to determine which waveguides are occupied by the nanobeads before shining light on the waveguides to activate the linkers 525. The shining of light can be repeated later when more nanobeads with enzyme or nucleic acid molecules attached thereon diffuse into other waveguides. Thus, the nanobeads can be used to increase the number of waveguides each having a single nucleic acid molecule or enzyme attached therein. After binding, the nanobeads are removed from the enzyme or nucleic acid molecules by, for example, chemically cleaving the linkers 515 or dissolving the nanobeads.

In alternative embodiments of the present teaching, sample holder 110 comprises slots or channels to facilitate confining the plurality of source points 210 on the sample holder 110. FIG. 6 illustrates a 3-dimensional view of a plurality of channels 610 formed in metallic film 114 on substrate 112. As a non-limiting example, channels 610 are formed in an aluminum film over a fused silica substrate. Each channel 610 has a width w that is smaller than a wavelength associated with light source 120. In exemplary embodiments of the present teaching, light from light source 120 is linearly polarized and the polarization direction is oriented with the electric field vector in the light wave along the width direction of the channels. Thus, only a bottom portion 612 in each channel 610 would be illuminated by the excitation light from light source 120, as shown in FIG. 6. Channels 610 can be formed using conventional techniques, such as conventional semiconductor processing or integrated circuit (IC) fabrication techniques.

Sample holder 110 with channels 610 formed thereon has multiple advantages over a sample holder with zero-mode waveguide holes 310 formed thereon. Because the fluorescent emissions are largely unpolarized, they would not be attenuated when they try to exit the channels 610 as much as when they try to exit holes 310 of sub-wavelength dimension. So, more emitted light from sample holder 110 can be collected and detected by optical assembly 130, resulting in an increased signal-to-noise ratio. In addition, each channel 610 can house a larger DNA template molecule if the DNA molecule is oriented parallel to the channel, as shown in a top-down view of the channel in FIG. 7. This way, the polymerase can migrate down the template for a much longer distance without exiting the illuminated volume 612. The DNA molecule can be tethered so that it can remain in one location while the polymerase, having a finite processivity, may fall off the template and be replaced by another polymerase. This can lead to longer read lengths and thus significantly simplified assembly processes, especially during de novo sequencing. Although FIG. 7 shows that channel 610 is closed at both ends 701 and 702, the channels 610 on the sample holder can be open on either or both ends by extending all the way to the edge(s) of the sample holder, as shown in FIG. 8C below.

The polymerase or template molecules can be attached to sample holder 110 using conventional photoactivatable linkers. In exemplary embodiments of the present teaching, channels 610 may house more than one polymerase or template molecule attached to sample holder 110 by flexible linkers that are placed in the channels 610. The molecules should be attached to the channels 610 in a resolvable fashion, meaning that they are sufficiently spaced from each other to allow efficient resolution of the emissions therefrom by the optical assembly 130.

FIG. 8A illustrates a method 800 for enriching the sample holder 110 with source points by attaching the polymerase or template molecules in channels 610 in a resolvable fashion according to exemplary embodiments of the present teaching. As shown in FIG. 8A, method 800 comprises step 810 in which exposed portions of the top surface of the substrate 112 are coated with a photoactivatable linking substance such as PHOTOACTIVATABLE BIOTIN™ (PAB), step 820 in which a solution of enzyme or template molecules is applied to sample holder 110, and step 830 in which the PAB is exposed to a pattern of light shone from the bottom surface of the substrate 112. The pattern of light may be created by interference or refraction using grating or other conventional techniques and has interleaving lighted and dark areas in each channel 610. The distance between two neighboring lighted areas is selected based on the resolution of the optical assembly 130 so that emissions from the two lighted areas can be separately and substantially simultaneously detected by the optical assembly 130. The PAB in the lighted areas will be activated, causing the template molecules to be attached to the sample holder in those areas, while the PAB in the dark areas will not be activated, so no template molecules will be attached to those areas. Method 800 further comprises step 840 in which sample holder 110 is washed to remove unbound template molecules, leaving the bound enzyme or nucleic acid molecules in each channel 610 and allowing the formation of optically resolvable source points 210 on sample holder 110.

Referring to FIG. 8B, instead of enzyme or template molecules, a solution of oligonucleotide (oligo) molecules can be applied to the substrate in step 820, and additional steps can be used to stretch an oligo along the bottom surface of each of a plurality of channels 610 on the sample holder 110. As shown in FIG. 8B, to attach an oligo to the bottom surface of a channel 610, method 800 further comprises step 822 in which an end of the oligo is attached at one end 701 of the channel using a chemical linker, such as a biotin-streptavidin or PNA-PNA hybridization binding, where one part of the linker is attached to the end of the oligo and the other part of the linker is bound to the substrate. Method 800 further comprises step 824 in which the oligo is stretched along the channel and held to the bottom surface of the channel. Many conventional techniques of stretching DNA molecules can be used in step 824, including but not limited to hydrodynamic, electrostatic, and magnetic manipulations. For example, the oligo molecules can be stretched using dielectrophoresis, in which the oligo is stretched by a direct current (DC) electric field or a high-frequency (e.g., 1 MHz) and high-density (e.g., 1 MV/m) alternating-current (AC) field applied between two electrodes.

In one embodiment, as shown in FIG. 8C, the sample holder 110 can be placed in a container 880 for holding the fluid containing the oligonucleotides. An electrode 871 made of, for example, Indium Ti, is placed above the end 701 of the channel, and another electrode 872 made of the same or different conductive material as electrode 871 is placed in the container 880 below the bottom surface 612 of the channel 610 and near the other end 702 of the channel. A field is provided between the two electrodes 871 and 872 such that the oligo 710 is stretched along the channel 610 and held by the field along the bottom of the channel. Depending on the relative lengths of the oligo 710 and the channel 610, the oligo 710 may extend beyond the channel toward the electrode 872, if the channel is open at the end 702. Electrode 872 may also be placed under the substrate 112, as shown in FIG. 8C.

With the electric field still on, step 830 is performed to further attach the oligo 710 so that the field can be removed later, preventing the field from interfering with the sequencing operation afterwards. The oligo in each of the plurality of channels may be stretched and attached simultaneously using the same or different electrodes.

After binding the enzyme, oligonucleotide, or target nucleic acid molecules to the sample holder 110, the sample holder 110 is placed in system 100. A fluorophore solution comprising fluorescence-labeled nucleotide analogs is applied to the sample holder 110. In exemplary embodiments of the present teaching, the speed of the incorporation chemistry can be altered by changing the temperature, viscosity, and concentration of the fluorophore solution, and/or by modifying the base chemistry. For example, adding molecules such as dye molecules to the fluorophore solution has been found to slow the rate of base incorporation. In addition, the sample holder 110 in system 100 should ideally be under temperature control to ensure consistency. The temperature could be changed during detection. For example, the temperature of the sample holder 110 can be reduced to slow down or stop incorporation activities until the rest of system 100 is ready to collect signals from the sample holder 110, as discussed below.

To observe light emitted from the source points, excitation light from light source 120 is directed towards the substrate side of the sample holder 110, and signals from fluorescing nucleotides are collected by optical assembly 130. The confinement of the source points on sample holder 110 helps to distinguish the fluorescent signals emitted by incorporated nucleotides in the source points 210 from those emitted by freely diffusing fluorescent ligands.

As explained in more detail below, multiple methods can be used in exemplary embodiments for base determination. For example, color, signal strength, bleaching life, fluorescent lifetime, and incorporation time can be combined to gain better base discrimination. The consistency of these measurements can be used to predict a confidence value for the base determination. Confidence values can be used to sort or weight the data and to discard data of low quality, thus allowing automated consensus generation from large amounts of data. This can improve the quality of the consensus as well as provide a measure of confidence.
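One way the confidence values could be used to weight and discard data is sketched below. This is a hypothetical illustration, not the method of the present teaching: it assumes each read already carries a per-base confidence between 0 and 1 (how that value is derived from color, signal strength, or lifetimes is not modeled), and the function name, the 0.5 threshold, and the "N" placeholder for an undetermined base are all assumptions.

```python
from collections import defaultdict

def consensus(reads, min_confidence=0.5):
    """reads: list of reads, each a list of (base, confidence) per position."""
    length = max(len(read) for read in reads)
    calls = []
    for pos in range(length):
        weights = defaultdict(float)
        for read in reads:
            if pos < len(read):
                base, conf = read[pos]
                if conf >= min_confidence:   # discard low-quality calls
                    weights[base] += conf    # weight the vote by confidence
        if not weights:
            calls.append(("N", 0.0))         # no usable data at this position
            continue
        best = max(weights, key=weights.get)
        calls.append((best, weights[best] / sum(weights.values())))
    return calls

reads = [
    [("A", 0.9), ("C", 0.8), ("G", 0.3)],
    [("A", 0.7), ("T", 0.6), ("G", 0.2)],
    [("A", 0.95), ("C", 0.9), ("T", 0.4)],
]
result = consensus(reads)
print(result)
```

Each consensus call is returned with the share of the confidence-weighted vote that supported it, which can serve as a rough measure of confidence in the consensus itself.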

Prior art systems, such as the one described by Levene et al., 2003 in “Zero-Mode Waveguides for Single-Molecule Analysis at High Concentrations,” SCIENCE, Vol. 299:682, which article is incorporated herein by reference, use a confocal fluorescent set up. The confocal fluorescent set up has multiple shortcomings. First, the aluminum film reflects the excitation light directly back into the collection optics. The reflected excitation light is very intense compared to the fluorescent signals from incorporated nucleotides. To attenuate the reflected light, multiple filters are used, and each filter attenuates a significant percentage of the already weak fluorescent signals. Furthermore, the excitation light in the set up of Korlach and Levene, supra, can also excite fluorescence in the optics. This unwanted fluorescence could pass through the filters, increasing the background noise.

In exemplary embodiments of the present teachings, excitation light from light source 120 is directed to the source points in sample holder 110 in an off-axis manner such that reflected excitation light, or a significant amount of it, could not enter the optical assembly 130. In some embodiments, where prism or wedge 108 is not provided, a light ray 901 from light source 120 is directed to sample holder 110 at an angle θ with respect to a normal direction N of substrate 112, as shown in FIG. 9. As the substrate 112 is made of a transparent material, such as fused silica, a relatively small first portion of the incident light 901 is reflected by the bottom surface 910 of substrate 112 and travels toward the optical assembly 130 as a first reflected light ray 912, while a second portion of the incident light enters the substrate at a different angle θ′ with respect to the normal N as a refracted light ray 914. Angle θ′ depends on angle θ and the refractive index n of the substrate 112. The refracted light ray 914 impinges on the metallic film at the angle θ′, and a relatively large portion of the refracted light ray 914 is likely to be reflected by the metallic film 114 and travel toward the bottom surface 910 of substrate 112 as light ray 916. Light ray 916, when crossing the bottom surface 910, is refracted again and leaves the bottom surface 910 at the angle θ as light ray 918. With the off-axis arrangement and a proper selection of the angle θ, little of the light ray 918 should enter the optical assembly 130 placed under sample holder 110, as shown in FIG. 9.
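The dependence of θ′ on θ and n, and the condition that the exiting ray 918 misses the collection optics, can be sketched as follows. This is a minimal illustration, assuming Snell's law at the air/substrate interface, a fused silica index of about 1.46, and a collection cone centered on the normal N and described by a numerical aperture; the numerical aperture of 0.5 and the function names are hypothetical and are not taken from the present teaching.

```python
import math

def refraction_angle_deg(theta_deg, n_substrate=1.46, n_outside=1.0):
    """Angle θ′ of ray 914 inside the substrate, from Snell's law."""
    s = n_outside * math.sin(math.radians(theta_deg)) / n_substrate
    return math.degrees(math.asin(s))

def reflected_ray_misses_optics(theta_deg, numerical_aperture):
    """True if a ray leaving at θ from the normal lies outside the acceptance cone."""
    return math.sin(math.radians(theta_deg)) > numerical_aperture

theta = 60.0  # assumed incidence angle, in degrees
theta_prime = refraction_angle_deg(theta)
print(theta_prime, reflected_ray_misses_optics(theta, 0.5))
```

In this simplified picture, any incidence angle whose sine exceeds the numerical aperture of the collection optics sends rays 912 and 918 outside the acceptance cone.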

To eliminate or reduce reflection at the bottom surface 910 of substrate 112, θ can be chosen to be within 10° of Brewster's angle θ_(B). Furthermore, to achieve zero or near zero reflection at the bottom surface 910 of substrate 112, the light from the light source 120 is linearly polarized with the E vector in the light parallel to the plane of incidence, which is the plane containing the incident ray 901 and the normal N of substrate 112. According to Brewster's Law, when the angle of incidence θ is equal to or near Brewster's angle θ_(B), the transmittance, i.e., the ratio of transmitted power in ray 914 to the incident power in ray 901 across bottom surface 910 of substrate 112, should be one or near to one, and the reflected power in ray 912 from surface 910 should be zero or near zero. Brewster's angle θ_(B) is given by:

$\theta_{B} = \tan^{-1}\left( \frac{n_{2}}{n_{1}} \right) = \tan^{-1}\sqrt{\frac{\varepsilon_{2}}{\varepsilon_{1}}}$

where n₁ and n₂ are the refractive indices of the respective media, i.e., air and substrate 112, and ε₁ and ε₂ are their respective electric permittivity values.
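As a worked illustration of the Brewster condition, the sketch below evaluates θ_(B) for an air/fused silica interface and checks the reflectance of light polarized in the plane of incidence. It assumes a fused silica index of about 1.46 and uses the standard Fresnel formula for p-polarized light, which the present teaching does not itself recite; the function names are illustrative.

```python
import math

def brewster_angle_deg(n1=1.0, n2=1.46):
    """Brewster's angle for light passing from medium n1 into medium n2."""
    return math.degrees(math.atan(n2 / n1))

def p_reflectance(theta_deg, n1=1.0, n2=1.46):
    """Fresnel power reflectance for light polarized in the plane of incidence."""
    ti = math.radians(theta_deg)
    tt = math.asin(n1 * math.sin(ti) / n2)  # refraction angle from Snell's law
    r = (n2 * math.cos(ti) - n1 * math.cos(tt)) / (n2 * math.cos(ti) + n1 * math.cos(tt))
    return r * r

theta_b = brewster_angle_deg()
for offset in (-10.0, 0.0, 10.0):
    print(f"theta={theta_b + offset:.1f} deg, Rp={p_reflectance(theta_b + offset):.4f}")
```

With these assumed indices, Brewster's angle comes out near 55.6°, and the p-polarized reflectance stays small across the ±10° window mentioned above.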

In some embodiments, system 100 is configured to achieve total internal reflection so that a significant amount of the excitation light from light source 120 is recycled within substrate 112, as shown in FIGS. 10A-10C. Total internal reflection is a phenomenon in which light incident upon a boundary from a denser medium to a less dense medium is completely reflected off the boundary. Since the light ray 916 reflected from metallic film 114 has to travel through the substrate 112 toward the boundary 1010 between the substrate 112 and air, it is possible to achieve total reflection such that the light ray 916 is recycled in the substrate 112.

FIG. 10A illustrates an optical arrangement for achieving total internal reflection according to exemplary embodiments of the present teaching. As shown in FIG. 10A, prism 108 is ideally made of the same material as substrate 112 and is in direct or fluidic contact with the substrate. In exemplary embodiments, prism 108 is fused with substrate 112 at a first surface 1012 of the prism. Prism 108 has a second surface 1014 disposed at an angle α with respect to the first surface 1012. In some embodiments, angle α is selected to be equal to the incident angle θ of light ray 901 from light source 120. Thus, light ray 901 from light source 120 is directed toward the second surface 1014 of prism 108 along a normal direction of the second surface 1014 of the prism 108. While a small portion of light ray 901 may be reflected by surface 1014, the rest of light ray 901 enters substrate 112 without any change in direction because prism 108 and substrate 112 are ideally made of the same material and are fused together or optically coupled with each other with a fluid. A large portion of light ray 901 is reflected by metallic film 114 and leaves the metallic film 114 as a light ray 1112. Light ray 1112 impinges on the boundary 1010 between substrate 112 and air, from the inside of substrate 112, at an angle equal to the angle θ with respect to the normal N of the substrate 112.

In exemplary embodiments of the present teaching, θ is selected to be equal to or larger than a critical angle θ_(τ), such that light ray 1112 is totally reflected from boundary 1010 and comes back towards metallic film 114 as light ray 1114. The above reflection from the metallic film 114 and the total reflection at the boundary 1010 are repeated for light ray 1114 and its derivatives, which are the reflected portion of light ray 1114 and reflected portion thereof and so on, as shown in FIG. 10A. In exemplary embodiments, metallic film 114 is formed to extend to the side surfaces 1020 and in some embodiments to the edge portions 1030 of the bottom surface 910 of the substrate 112 so that light rays reflected from the metallic film 114 have little chance of escaping substrate 112 at the side surfaces 1020 but are recycled and used repeatedly as excitation light for the source points 210, as shown in FIG. 10A. According to Snell's Law, the critical angle θ_(τ) is determined by:

${\theta_{\tau} = {\sin^{- 1}\left( \frac{n_{1}}{n_{2}} \right)}},$

where n₁ and n₂ are the refractive indices of the respective media, i.e., air and substrate 112, respectively.
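The critical-angle condition can be evaluated directly. The following is a minimal sketch, assuming air (index of about 1) outside a fused silica substrate with an index of about 1.46; the function names and the two trial angles are illustrative assumptions.

```python
import math

def critical_angle_deg(n_outside=1.0, n_substrate=1.46):
    """Critical angle for light inside the substrate meeting the outer medium."""
    return math.degrees(math.asin(n_outside / n_substrate))

def is_totally_reflected(theta_deg, n_outside=1.0, n_substrate=1.46):
    """True if a ray at theta from the normal is totally internally reflected."""
    return theta_deg >= critical_angle_deg(n_outside, n_substrate)

theta_c = critical_angle_deg()
print(f"critical angle = {theta_c:.1f} deg")
print(is_totally_reflected(45.0), is_totally_reflected(40.0))
```

With these assumed indices the critical angle is roughly 43°, so an internal incidence angle of 45° would be recycled while 40° would not.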

In further embodiments, the collection efficiency of the optical assembly 130 can be increased by using a fluid 104 having a refractive index between that of the air and that of the transparent material used to construct the substrate 112. For example, when substrate 112 is made of fused silica having a refractive index of about 1.46, water can be used as the fluid 104 because it has a refractive index of 1.33, which is between the refractive index of air (˜1) and that of fused silica (˜1.46). The fluid 104 is placed between the substrate 112 and the optical assembly 130. In the embodiments employing the fluid 104, the critical angle is determined by:

${\theta_{\tau} = {\sin^{- 1}\left( \frac{n_{f}}{n_{2}} \right)}},$

where n_(f) is the refractive index of the fluid. The critical angle θ_(τ) is therefore increased by employing the fluid. With the increase in the critical angle θ_(τ), the collection efficiency is increased because more emitted light from the source points is able to escape through the bottom surface 910 of the substrate 112 without going through total internal reflection, and can therefore be collected by the optical assembly 130. The angle α of the prism 108 and the incident angle θ of the excitation light may be adjusted accordingly to allow total internal reflection of the excitation light to still occur in the presence of the fluid 104.
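The gain from the larger critical angle can be estimated with a simple escape-cone calculation. This is a rough sketch only: it assumes the emission into the substrate is isotropic over the downward hemisphere, ignores Fresnel losses and any influence of the metallic film on the emission pattern, and does not account for the acceptance angle of the optical assembly 130. The function name is hypothetical; the indices are those given above.

```python
import math

def escape_fraction(n_outside, n_substrate=1.46):
    """Fraction of downward isotropic emission that lies inside the escape cone."""
    # cos(theta_c) from sin(theta_c) = n_outside / n_substrate
    cos_c = math.sqrt(1.0 - (n_outside / n_substrate) ** 2)
    # Solid angle of a cone of half-angle theta_c relative to a hemisphere
    return 1.0 - cos_c

frac_air = escape_fraction(1.0)
frac_water = escape_fraction(1.33)
print(f"air: {frac_air:.3f}, water: {frac_water:.3f}")
```

Under these assumptions, roughly twice as much of the emitted light avoids total internal reflection when water rather than air is below the substrate.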

In another exemplary embodiment, as shown in FIG. 10B, total internal reflection is facilitated by directing the excitation light toward a side 1060 of the substrate 112. Excitation light 1070 from the light source 120 is directed to a side surface 1062 at an angle θ₁ with respect to a normal N′ of the side surface 1062. A refracted portion 1072 of the excitation light 1070 leaves the side surface 1062 at an angle θ₂ that is dependent on the angle θ₁ according to Snell's Law, and impinges on the aluminum film 114 on top of the substrate 112 at an angle θ₃=α−θ₂, where α is the angle between the side surface 1062 and the bottom surface 1010 of the substrate 112. After reflection from the aluminum film, a reflected portion 1074 of the excitation light impinges on the surface 1010 at an angle θ₄ that is equal to the angle θ₃. θ₂ and α can be selected such that θ₃ or θ₄ is equal to or larger than the critical angle θ_(τ) for total reflection at the bottom surface 1010 of the substrate 112. For example, α in this case can be selected to be at or near 90° in order to create a large θ₃ or θ₄ angle. Thus, most of the refracted portion 1072 of the excitation light 1070 can be repeatedly used to illuminate the source points on the sample holder 110 before exiting the substrate 112.

In another exemplary embodiment, as shown in FIG. 10C, the side 1060 has a beveled surface 1062 forming an angle α with the bottom surface 910 of the substrate 112, where α is less than 90° and larger than or equal to the critical angle θ_(τ). Excitation light 1070 from the light source 120 is directed to the beveled surface 1062 along a normal N′ of the beveled surface 1062. A refracted portion 1072 of the excitation light 1070 leaves the beveled surface 1062 at an angle θ₂ that is dependent on the angle θ₁ according to Snell's Law, and impinges on the aluminum film 114 on top of the substrate 112 at an angle θ₃=α. After reflection from the aluminum film, a reflected portion 1074 of the excitation light impinges on the surface 1010 at an angle θ₄ that is equal to the angle θ₃. With α being equal to or larger than the critical angle θ_(τ), θ₄ is also equal to or larger than θ_(τ), and total reflection occurs at the bottom surface 1010 of the substrate 112.

In another exemplary embodiment, as shown in FIG. 10D, the side 1060 has a beveled surface 1062 forming an angle α with the bottom surface 910 of the substrate 112, and excitation light 1070 from the light source 120 is directed to the beveled surface 1062 at an angle θ₁ with respect to a normal N′ of the beveled surface 1062. A refracted portion 1072 of the excitation light 1070 leaves the beveled surface 1062 at an angle θ₂ that is dependent on the angle θ₁ according to Snell's Law, and impinges on the aluminum film 114 on top of the substrate 112 at an angle θ₃=θ₂+α. After reflection from the aluminum film, a reflected portion 1074 of the excitation light impinges on the surface 1010 at an angle θ₄ that is equal to the angle θ₃. θ₂ and α can be selected such that θ₃ or θ₄ is equal to or larger than the critical angle θ_(τ) for total reflection at the bottom surface 1010 of the substrate 112. Thus, most of the refracted portion 1072 of the excitation light 1070 can be repeatedly used to illuminate the source points on the sample holder 110 before exiting the substrate 112.
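The angle relations for the side-entry arrangements of FIGS. 10B through 10D can be checked numerically. The sketch below is illustrative only: it assumes a fused silica substrate (index of about 1.46) in air, takes θ₃ = α − θ₂ for FIG. 10B and θ₃ = θ₂ + α for FIG. 10D as stated above, and treats FIG. 10C as the case of normal incidence on the beveled surface. The function names and the example angles are hypothetical.

```python
import math

def film_incidence_deg(theta1_deg, alpha_deg, sign, n_substrate=1.46, n_outside=1.0):
    """Return θ₃ = α + sign·θ₂, with θ₂ from Snell's law at the side surface.

    sign=-1 follows FIG. 10B (θ₃ = α − θ₂); sign=+1 follows FIG. 10D (θ₃ = θ₂ + α).
    """
    sin_theta2 = n_outside * math.sin(math.radians(theta1_deg)) / n_substrate
    theta2_deg = math.degrees(math.asin(sin_theta2))
    return alpha_deg + sign * theta2_deg

def recycles(theta3_deg, n_substrate=1.46, n_outside=1.0):
    """True if θ₄ = θ₃ meets or exceeds the critical angle at the bottom surface."""
    return theta3_deg >= math.degrees(math.asin(n_outside / n_substrate))

fig_10b = film_incidence_deg(30.0, 90.0, sign=-1)  # assumed θ₁ = 30°, α = 90°
fig_10c = film_incidence_deg(0.0, 50.0, sign=-1)   # normal incidence, assumed α = 50°
fig_10d = film_incidence_deg(10.0, 40.0, sign=+1)  # assumed θ₁ = 10°, α = 40°
print(fig_10b, fig_10c, fig_10d)
print(recycles(fig_10b), recycles(fig_10c), recycles(fig_10d))
```

For each of these assumed geometries θ₃ exceeds the critical angle for a fused silica/air boundary, so the excitation light would be recycled within the substrate.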

In another exemplary embodiment, the excitation light 901 from the light source 120 is coupled into the substrate 112 through a grism, which is a prism and grating combination, or grating 1080 formed on or attached to a portion of the bottom surface 1010 of the substrate 112, as shown in FIG. 10E. With the use of the grism or grating 1080, the excitation light 901 can enter the substrate at an angle θ with respect to the normal N of the substrate that is equal to or larger than the critical angle θ_(τ), and after being reflected from the aluminum film 114, would impinge on the bottom surface 1010 of the substrate 112 at the angle θ and be totally reflected from the bottom surface 1010 back into the substrate. Thus, the excitation light is recycled within the substrate, as shown in FIG. 10E.

The arrangements in FIGS. 9 through 10E are advantageous over conventional systems in part because, by placing the detector(s) and the optical assembly 130 under sample holder 110, fluorescent signals can be collected through the bottom surface 910 of the sample holder 110 without the interference of reflected light from metallic film 114. Furthermore, since the excitation light and the fluorescent signals entering the optical assembly 130 do not have a common light path, there is no need for heavy filtering to separate the excitation light entering the substrate 112 from the fluorescent signals exiting the substrate 112 through the bottom surface 910.

In various embodiments, optical assembly 130 comprises at least one pixelated or multi-element detector configured to sense light signals landing thereon and a set of optical components configured to direct light emissions from the source points toward the multi-element detector(s). FIG. 11 illustrates a top view of a portion of the sample holder 110 showing a plurality of source points 210. In exemplary embodiments, as shown in FIG. 12, the pixelated or multi-element detector comprises a plurality of addressable light-sensing elements 1210 organized in an imaging plane 1220, such as the x-y plane. The set of optical components is configured to direct light emissions from different source points toward different areas 1230 of the imaging plane 1220 so that light emissions from different source points 210 can be separately and substantially simultaneously detected.

Thus, as shown in FIG. 12, light emissions from each source point 210 form an image of the source point in an area 1230 on the imaging plane 1220. In exemplary embodiments, the set of optical components further comprises a light-dispersing setup configured to separate light emissions from the multiple source points 210 into multiple spectral components so that the detected light from each source point is spread out spectrally along the y axis and images 1230 represent spectrally resolved images of the source points 210, as shown in FIG. 12. When the light-dispersing setup is provided, enough separation between neighboring source points 210 on the sample holder 110 is provided to ensure that the spectrally resolved images 1230 of the source points do not overlap with each other. In addition, a sufficient gap g along the y-direction between an image 1230 of a source point and an image 1230 of a neighboring source point is provided to prevent overlap of data associated with the two source points due to cutoff filter tolerances.

The position of the images 1230 can be determined by a spatial calibration to associate each source point on the sample holder with an area 1230 on the image plane 1220. The calibration can be done by using a dye solution or a light source that is not blocked by system filters. Such calibration, however, may not be required if there is no need to match the images 1230 with the source points 210. In addition, tolerance should be allowed to ensure that there is sufficient separation d between the areas 1230 and the edges of the image plane 1220, and the separation should be controlled to allow detection of all of the source points 210 on the image plane 1220. As a non-limiting example, the buffer zone d between a side 1232 of an area 1230 and the edge 1222 of the image plane 1220 that it faces is no more than 8 pixels wide.

Although FIG. 12 shows the source point images 1230 each having a width ω of 2 pixels, a pitch between two neighboring images in the x-direction of 4 pixels, a gap g between two adjacent rows of images of 4 pixels, and a buffer distance d of the images 1230 from each edge of the image plane 1220 of 8 pixels, these numbers are shown as examples and can be different in different applications or are adjustable to suit different applications. Moreover, although FIG. 12 shows only two rows of source points 210 and two rows of images 1230 corresponding to the source points, in practice, there may be more or fewer rows of source points or images. Also, the source points 210 do not have to be arranged in rows and can be spread out on the sample holder in any order or even randomly as long as they are sufficiently separated so that their images 1230 do not overlap on the image plane 1220.
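The example layout numbers can be turned into a capacity estimate for a given detector. The following is a sketch under stated assumptions: the width, pitch, gap g, and buffer d default to the example values of FIG. 12, while the 512 × 512 pixel sensor and the 32-pixel spectral length of each image along the y axis are hypothetical values chosen only for illustration. The function names are likewise illustrative.

```python
def image_grid(sensor_w, sensor_h, spectrum_len, width=2, pitch=4, gap=4, buffer=8):
    """Return (images_per_row, rows) that fit on the imaging plane, in pixels."""
    usable_w = sensor_w - 2 * buffer
    usable_h = sensor_h - 2 * buffer
    per_row = (usable_w - width) // pitch + 1 if usable_w >= width else 0
    rows = (usable_h + gap) // (spectrum_len + gap) if usable_h >= spectrum_len else 0
    return per_row, rows

def image_origins(sensor_w, sensor_h, spectrum_len, width=2, pitch=4, gap=4, buffer=8):
    """Top-left (x, y) pixel of every image area on a regular grid."""
    per_row, rows = image_grid(sensor_w, sensor_h, spectrum_len, width, pitch, gap, buffer)
    return [(buffer + i * pitch, buffer + j * (spectrum_len + gap))
            for j in range(rows) for i in range(per_row)]

grid = image_grid(512, 512, spectrum_len=32)
origins = image_origins(512, 512, spectrum_len=32)
print(grid, len(origins))
```

For the assumed sensor, this regular grid accommodates 124 images per row in 13 rows; a random arrangement of source points, which is also permitted above, would need an explicit overlap check instead.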

In exemplary embodiments, the optical assembly 130 is similar to the one in the optical system disclosed in U.S. Pat. No. 6,690,467 B1 by Reel, which is incorporated herein by reference. As shown in FIG. 13, as a non-limiting example, the optical assembly comprises a collection lens assembly 1310, a reimaging lens assembly 1320, and at least one CCD detector 1330 as the pixelated detector. The collection lens assembly 1310 comprises at least one collection lens configured to collect light emissions from the source points 210. The reimaging assembly 1320 comprises at least one reimaging lens configured to focus the collected light emissions from different source points into different areas 1230 of the imaging plane 1220 of the detector 1330.

The use of the collection lens assembly 1310 may also provide a substantially collimated region between the collection lens assembly 1310 and the reimaging lens assembly 1320, which is suitable for insertion of a variety of optical devices such as an aperture 1340, a light-dispersing assembly 1350, and/or a laser line filter 1360. In exemplary embodiments, the light-dispersing assembly 1350 comprises at least one grating, prism, or grism configured to spread spectrally rays of light that pass through it. For example, a transmission grating deflects rays of light that strike thereon at an angle roughly proportional to the wavelength of the light. Thus, the collimated light emissions from the source points 210, after going through the transmission gratings, are dispersed spectrally. With the spectral dispersion, a first light ray of a first wavelength and a second light ray of a second wavelength coming from a same source point 210 should arrive at the reimaging lens assembly 1320 at different angles with respect to an optical axis of the reimaging lens assembly 1320 and thus be focused onto different locations 1234 and 1236 of the area 1230 corresponding to the source point, as shown in FIG. 12. Locations 1234 and 1236 are spaced apart from each other along the y-axis because of the spectral dispersion.
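The separation of locations 1234 and 1236 can be estimated from the standard grating equation. This is a minimal sketch, assuming collimated light at normal incidence on a transmission grating, first-order diffraction, and an ideal reimaging lens that maps a deflection angle to an offset of focal length times the tangent of that angle. The grating density (300 lines/mm), focal length (50 mm), and the two wavelengths are hypothetical values, not parameters of the present teaching.

```python
import math

def detector_offset_mm(wavelength_nm, lines_per_mm=300.0, focal_length_mm=50.0, order=1):
    """Offset along y of a wavelength's image, for normal incidence on the grating."""
    groove_spacing_nm = 1e6 / lines_per_mm
    sin_theta = order * wavelength_nm / groove_spacing_nm  # grating equation
    theta = math.asin(sin_theta)
    return focal_length_mm * math.tan(theta)

y_520 = detector_offset_mm(520.0)
y_600 = detector_offset_mm(600.0)
print(f"separation = {y_600 - y_520:.3f} mm")
```

Dividing the separation by the detector's pixel size would give the number of pixels between the two spectral components, which in turn informs the spectral length of each area 1230.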

Instead of a prism, grating, or grism in the light dispersing assembly 1350, dichroic or bandpass filters can be used to separate the spectral components in the fluorescent signals from each source point. FIG. 14A illustrates other embodiments of optical assembly 130. As shown in FIG. 14A, optical assembly 130 comprises a collection assembly 1310, an imaging assembly comprising imaging lenses 1332-1, 1332-2, 1332-3, and 1332-4 disposed at 90° angles with respect to each other, a plurality of CCD detectors 1330, and a light dispersing assembly comprising dichroic or bandpass filters D1, D2, and D3 placed at 90° angles with respect to each other. Each dichroic or bandpass filter is configured to allow passage of one of the first, second, and third bands of fluorescent signals, respectively, and to reflect all other frequencies. The plurality of CCD detectors 1330 comprises CCD detectors 1330-1, 1330-2, 1330-3, and 1330-4. CCD detector 1330-1 is placed behind dichroic filter D1 to collect the first band of fluorescent signals, CCD detector 1330-2 is placed behind dichroic filter D2 to collect the second band of fluorescent signals, CCD detector 1330-3 is placed behind dichroic filter D3 to collect the third band of fluorescent signals, and CCD detector 1330-4 is placed in front of dichroic filter D3 to collect the signals reflected therefrom, which should comprise the fourth band of fluorescent signals. Other filters (not shown) can be placed before CCD detectors 1330, respectively, for improved frequency selection.

Alternatively, a dichroic or bandpass filter can be configured to reflect the first, second, third, or fourth band of fluorescent signals, and to allow passage of all other frequencies. It is also possible to combine bandpass, notch, lowpass and highpass filters in any combination that permits appropriate separation of the emission wavelengths. FIG. 14B illustrates still other embodiments of optical assembly 130. As shown in FIG. 14B, optical assembly 130 comprises a collection lens assembly 1310, a reimaging assembly comprising reimaging lenses 1320-1, 1320-2, 1320-3 and 1320-4, and CCD detectors 1330-1, 1330-2, 1330-3, and 1330-4 each behind a respective one of the reimaging lenses 1320-1, 1320-2, 1320-3 and 1320-4. Optical assembly 130 further comprises a light dispersing assembly comprising dichroic or bandpass filters DF1, DF2, DF3 and DF4 placed in a row under collection assembly 1310 and each at an angle γ to an optical axis (shown by the dashed line) of the collection assembly 1310. Dichroic or bandpass filters DF1, DF2, DF3, and DF4 are each placed in front of a respective one of the reimaging lenses 1320-1, 1320-2, 1320-3 and 1320-4 and are configured to reflect the first, second, third and fourth bands of fluorescent signals, respectively, while allowing passage of signals of other frequencies. Other filters (not shown) can be placed before CCD detectors 1330, respectively, for improved frequency selection.

Reimaging lenses 1320-1, 1320-2, 1320-3 and 1320-4 can be separate lenses or sections of a single lens, and CCD detectors 1330-1, 1330-2, 1330-3, and 1330-4 in FIG. 14B can be separate CCD detectors or sections of a single CCD detector. Although FIG. 14B shows that the dichroic or bandpass filters DF1, DF2, DF3 and DF4 are at a roughly 45° angle with respect to the optical axis of the collection assembly 1310, such placement is not necessary and the angle γ can be larger or smaller than 45°. FIG. 14C illustrates an exemplary configuration of the optical assembly 130 when the angle γ is close to 90° so that the collimated light emissions from a source point would impinge on the dichroic or bandpass filters DF1, DF2, DF3 and DF4 at a small incident angle β and be imaged by the reimaging lens assembly 1320 onto the CCD detector(s) 1330.

The CCD assembly 1330 comprises at least one charge-coupled device (CCD) array, such as a regular CCD array, a complementary metal-oxide-semiconductor (CMOS) array, an electron-multiplying CCD (EMCCD) array, an intensified CCD (ICCD) array, or an electron-bombarded CCD (EBCCD) array. A CCD array is advantageous over other multi-element detectors, such as an array of avalanche photodiode (APD) based detectors or photomultiplier tube (PMT) based detectors, because the number of elements in a CCD array is much higher, as the size of a CCD pixel in the CCD array can be as small as 3 μm or even smaller. Therefore, signals from different source points can be differentiated by detecting them using different groups of elements in the CCD array, as discussed above. A CCD array can also be much less costly than an APD or PMT array.

To amplify the low light signals from fluorescing labels on the incorporated bases above background noise in CCD arrays, a high-sensitivity CCD-based device such as an EMCCD, ICCD, or EBCCD is used in exemplary embodiments. Because base incorporation rates of DNA molecules are fast, the speed of reading data out from a CCD detector is as important as sensitivity: the detector must capture event data and read the data out over a short period of time to allow the next event to be observed. Through careful design of a readout scheme, a CCD array can be made fast enough to resolve fluorescent emissions from two consecutive incorporation events associated with a same source point. Moreover, a CCD with multiple outputs or taps can be used to increase the CCD readout speed. For example, a CCD with 4 taps can allow a 4-times increase in readout speed, which allows images of more source points to be read for increased throughput.

To further improve the readout speed, a frame transfer CCD (FTCCD) array 1500, as illustrated in FIG. 15A, is used in detector 1330 according to exemplary embodiments of the present teaching. The FTCCD array 1500 allows data readout operations to be performed concurrently with data collection operations. As shown in FIG. 15A, CCD array 1500 comprises a dark area 1510, an image area 1520, a masked storage area 1530, and a horizontal register 1540. The image area 1520 of CCD array 1500 is where light signals are detected and is constructed as a two-dimensional array of light-sensitive elements or pixels. Storage area 1530 comprises an array of storage elements covered with an opaque mask to provide temporary storage for an image frame transferred from the image area. The signal that is accumulated in each pixel in image area 1520 is read out by a process of parallel transfer in the negative y-direction shown in the figure, whereby charges in each horizontal row within image area 1520 and storage area 1530 are transferred to the next row and so forth until ultimately they reach the horizontal register 1540, which is a serial readout register that allows charges to be transferred in the x-direction to an output node (not shown), from which they are read out. While data from storage area 1530 is read, image area 1520 is available to collect a next round of light signals.

Dark area 1510 is a region of excess pixels. Because these pixels are not illuminated, they do not have to be cleared during each readout. Usually, the combination of dark area 1510 and image area 1520 maps directly onto the storage area 1530. In one embodiment, image area 1520 occupies a small fraction (e.g. 1/10) of the combination so that source point data can be read out at, for example, 10 times the normal frame rate. In exemplary embodiments, CCD array 1500 is kept cool at about 80° C. below zero so that minimal dark current charges are generated. In certain embodiments, the dark area is eliminated when CCD 1500 is custom built to have just the right number of rows in the image area 1520.

In some embodiments, an interline CCD or a combined interline and frame transfer CCD may be employed. FIG. 15B illustrates an interline CCD array 1550 in CCD detector 1330 according to exemplary embodiments of the present teaching. As shown in FIG. 15B, interline CCD array 1550 comprises separate image regions 1560 and CCD storage regions 1570. CCD storage regions 1570 are protected with a mask structure and positioned alongside respective ones of image regions 1560 such that CCD storage regions 1570 and image regions 1560 together form an alternating parallel array. Image regions 1560 may comprise circuit elements such as photodiodes for capturing the images of the source points while CCD storage regions 1570 shift previously acquired images in a parallel fashion towards a horizontal register 1590. The horizontal register 1590 then sequentially shifts image information from each CCD storage region to an output amplifier or other processing circuits (not shown) as a serial data stream. The readout process is repeated until data from all of the CCD storage regions 1570 are transferred to the output amplifier.

Where an actual two-dimensional image is desired from the CCD, the image data in a digital format is reconstructed to yield the final image. Where the data is to be used for non-pictorial or non-imaging applications, the relevant pixel data may be identified and processed according to its intended purpose. One advantage of interline CCDs is their ability to operate without a shutter or synchronized strobe, allowing for an increase in device speed and faster frame rates. An interline CCD array can be used to eliminate blurring or image smear, which is a common problem with frame-transfer CCDs, by shifting charges horizontally directly from the image regions to the respective ones of the storage regions.

In exemplary embodiments, readout speed is further improved by limiting the number of source points to be imaged on the CCD so that the number of data rows to be read is minimized. The number of data rows to be read may also be minimized by binning vertically (in the y-direction), especially when the source points 210 and thus the images of the source points 1230 are in an array so that the positions of the images can be fairly accurately predicted, as shown in FIG. 12. With the fluorescent signals dispersed spectrally over a range of pixels along the y-direction, the pixels can be binned in the y-direction to capture informative wavelength groupings for base determination. For example, as shown in FIG. 12, each source point image 1230 may comprise four wavelength groups of interest C1, C2, C3, and C4 interleaved with wavelength groups that are not of interest U1, U2, and U3. Data in each of the wavelength groups of interest can be binned, while data in the groups that are not of interest can be cleared without being read, as discussed below. Binning can also be done in the x-direction for a further increase in readout speed because shifting is often faster than reading. With a CCD array, binning can be done on-chip in a conventional manner that does not introduce noise.
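
As a minimal sketch of the vertical binning described above (a software analogy, not the on-chip implementation), the following Python function sums the pixel rows of each wavelength group of interest and skips the groups that are not of interest. The function name bin_wavelength_groups, the row counts (4 rows per group of interest, 2 rows per other group), and the sample pixel values are illustrative assumptions and do not come from FIG. 12.

```python
def bin_wavelength_groups(column, layout):
    """Sum the pixel rows of each kept group along y; skip the others.

    column: charge values of one source point image, ordered along y.
    layout: (label, n_rows, keep) tuples in the same order along y.
    """
    binned = {}
    row = 0
    for label, n_rows, keep in layout:
        rows = column[row:row + n_rows]
        if len(rows) != n_rows:
            raise ValueError("layout is longer than the pixel column")
        if keep:
            binned[label] = sum(rows)  # one binned value replaces n_rows reads
        row += n_rows                  # groups not kept are skipped unread
    return binned


# Assumed layout: C1..C4 are 4 rows each, U1..U3 are 2 rows each.
LAYOUT = [
    ("C1", 4, True), ("U1", 2, False),
    ("C2", 4, True), ("U2", 2, False),
    ("C3", 4, True), ("U3", 2, False),
    ("C4", 4, True),
]

column = list(range(22))  # placeholder pixel values 0..21
binned = bin_wavelength_groups(column, LAYOUT)
print(binned)
```

Under these assumptions, 22 pixel rows collapse into four values, one per wavelength group of interest.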

The CCD array in detector 1330 may also be made to allow clearing of the horizontal register 1540 or 1590. This can speed up readouts if desired data is separated by rows of unneeded pixels. FIG. 16A illustrates examples of four normalized fluorescent spectra associated with the four different fluorescent labels used to label respective ones of four different nucleotides. The four spectra have peak regions 1610, 1620, 1630, and 1640 corresponding to respective ones of the first, second, third and fourth frequency bands. Only four fluorescent frequency bands need to be collected by detector 1330. Data can be over-determined, however, to comprise more than the spectrum of data associated with an incorporation event. With the light dispersing setup shown in FIG. 13, a continuous portion of a spectrum of data is spread along the y-direction, as shown in FIG. 16B. FIG. 16C illustrates the over-determined data with the most informative wavelengths corresponding to the four frequency bands in gray and the less informative wavelength ranges shown in white. The data rows in the less informative wavelength range between two gray bands are referred to as a band gap bg; these rows may not be needed and can be cleared without being read out for increased readout speed. The width and position of each informative frequency band in FIG. 16C can be optimized for best signal to noise (S/N) ratio after multicomponenting, as explained in more detail below.

While the frequency bands in FIGS. 12, 16B, and 16C are shown to spread along the y-direction, this is not essential and orienting the frequency bands in a different direction on the CCD may be advantageous in some cases.

FIG. 17A is a flow chart of a method 1700 for reading out image data associated with a plurality of source points according to exemplary embodiments of the present teaching. As a non-limiting example, method 1700 is described in the context of a frame transfer EMCCD array of 512×512 pixels in the image area that allows clearing of the horizontal register. Nevertheless, the method can be readily adapted to other types of CCD array. As shown in FIG. 17A, method 1700 comprises step 1710 in which the pixels in the image area are shifted down by a frame having, for example, 128 rows. So, 128 rows of pixels are shifted from the image area into the masked storage area. These pixels are not read out for a number of (e.g., 4) frames (512/128=4). Pixels from the dark area will be shifted down to collect the next images, but the provision of a dark area is not necessary because the CCD is kept cool (−80° C.) so minimal dark current charge is generated. Method 1700 further comprises step 1720 in which a first number of (e.g., 8) buffer rows in a frame are shifted down to the register and cleared. The number 8 is arbitrarily chosen. This number should correspond to the distance d shown in FIG. 12. Method 1700 further comprises step 1730 in which the pixels associated with the first frequency band are vertically binned and shifted. As a non-limiting example, 4 rows of pixels are associated with a frequency band, and these rows are binned and read out as one row. Method 1700 further comprises step 1740 in which at least some of the rows in the band gap bg before the next frequency band are cleared. Steps 1730 and 1740 are thereafter repeated for each of the other frequency bands. Method 1700 further comprises step 1750 in which at least some of the buffer rows in the gap g between two rows of images 1230 are shifted and cleared. Steps 1730 through 1750 are repeated for each row of source points when the source points are arranged in an array.
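
The sequence of operations in method 1700 can be sketched as a list of readout commands for one 128-row frame. This is an illustrative outline only: the function name readout_schedule, the operation labels, and the row counts for the band gap (2 rows) and the gap g (8 rows) are assumptions chosen so that one frame holds four rows of source points; only the 128-row frame, the 8 leading buffer rows, and the 4 rows per frequency band come from the example in the text.

```python
def readout_schedule(n_source_rows=4, n_bands=4, frame_rows=128,
                     lead_rows=8, band_rows=4, band_gap_rows=2,
                     row_gap_rows=8):
    """Return (operation, n_rows) pairs for reading out one frame."""
    # Step 1710: shift one frame from the image area into the storage area.
    ops = [("shift_frame_to_storage", frame_rows)]
    # Step 1720: shift the leading buffer rows into the register and clear.
    ops.append(("shift_and_clear", lead_rows))
    for _ in range(n_source_rows):
        for band in range(n_bands):
            # Step 1730: bin the rows of one frequency band, read as one row.
            ops.append(("bin_and_read", band_rows))
            if band < n_bands - 1:
                # Step 1740: clear the band gap before the next band.
                ops.append(("shift_and_clear", band_gap_rows))
        # Step 1750: clear the buffer rows between two rows of images.
        ops.append(("shift_and_clear", row_gap_rows))
    return ops


ops = readout_schedule()
rows_consumed = sum(n for _, n in ops[1:])
n_reads = sum(1 for name, _ in ops if name == "bin_and_read")
n_clears = sum(1 for name, _ in ops if name == "shift_and_clear")
print(rows_consumed, n_reads, n_clears)
```

With these assumed row counts, the 128 rows of the frame are consumed by 16 binned reads and 17 register clears, instead of 128 separate row reads.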

FIG. 17B is a spreadsheet for estimating the throughput for system 100 using a frame transfer EMCCD as detector 1330 according to exemplary embodiments. As shown in FIG. 17B, for a frame transfer CCD having 512×512 pixels in the image area, a normal readout time of 1×10⁻⁷ second, a vertical shift time of 4×10⁻⁷ second, and a horizontal register clear time of 5×10⁻⁶ second, using method 1700, the time to read out emission data from 4 rows of source points is about 0.00096 seconds, resulting in a readout speed higher than 1000 Hz, which is much faster than the normal readout speed of 30 Hz for the CCD. FIG. 17C is a spreadsheet for estimating the throughput for system 100 using a 1004×1002 EMCCD as detector 1330 according to exemplary embodiments. As shown in FIG. 17C, the time for reading out emission data from 4 rows of source points in this case is about 0.00113 seconds, resulting in a readout speed close to 1000 Hz.
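
The text gives the per-operation times but not the operation counts used in FIG. 17B, so the following is only a rough timing model under stated assumptions, not a reproduction of the spreadsheet. It assumes that the readout time applies per pixel of each binned row, that the vertical shift time applies per row of a 128-row frame, and that each frame needs 16 binned reads and 17 register clears. The function name readout_time and all default counts are hypothetical.

```python
def readout_time(n_cols=512, frame_rows=128, n_reads=16, n_clears=17,
                 t_read=1e-7, t_vshift=4e-7, t_clear=5e-6):
    """Estimate the time in seconds to read one frame.

    Assumptions (not from FIG. 17B): t_read is per pixel, t_vshift is
    per row, and n_reads / n_clears are illustrative operation counts.
    """
    shift_time = frame_rows * t_vshift      # move every row once
    read_time = n_reads * n_cols * t_read   # serial read of binned rows
    clear_time = n_clears * t_clear         # register clears for skipped rows
    return shift_time + read_time + clear_time


frame_time = readout_time()
frame_rate = 1.0 / frame_time
print(f"{frame_time * 1e3:.3f} ms per frame, {frame_rate:.0f} Hz")
```

Under these assumptions the estimate comes out slightly below one millisecond per frame, that is, a frame rate somewhat above 1000 Hz. The serial read of the binned rows dominates the total, which is why reducing the number of rows read has the largest effect on speed.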

In exemplary embodiments of the present teaching, optical data collected by detector 1330 is sent to computer system 140 and optionally or additionally DSP or FPGA 150 for base determination. FIG. 18 is a block diagram of computer system 140 according to exemplary embodiments of the present teaching. As shown in FIG. 18, computer system 140 comprises a central processing unit (CPU) 1810, a memory unit 1820, a data input port 1830, and a user interface 1840, interconnected by one or more buses 1850. Memory unit 1820 preferably stores operating system software 1822 and other software programs including a program 1824 for base determination. Memory unit 1820 further comprises a data storage unit 1826 for storing optical data collected by detector 1330 and sent to computer system 140 through data input port 1830. Program 1824 comprises coded instructions, which, when executed by CPU 1810, cause computer system 140 to carry out methods for detecting light emissions from multiple source points as described above and/or methods for base determination based on the detected data according to exemplary embodiments of the present teaching, as explained in more detail below.

With the illumination of excitation light, a labeled and incorporated nucleotide should fluoresce by emitting photons from an associated source point. The spectrum of the photons collected at detector 1330 from this single incorporation event should be a collection of photons with different energies or frequencies. When the number of collected photons is large, the spectrum should resemble the normal dye spectrum corresponding to the fluorescent dye used to label the incorporated nucleotide. The spectrum will vary, however, due to the small number of photons that are typically collected by detector 1330 from the single incorporation event in each collection time period.

For example, a fluorescing dye may emit 10,000 photons over a 10 microsecond period, and about 4% of the 10,000 photons may be detected by detector 1330 in ten time bins each corresponding to, for example, a one microsecond period. Thus, roughly 40 photons may be collected in each time bin. Plotted spectrally over 10 spectral bins, the 40 expected photons might spread out like the histogram shown in FIG. 19. Because of the small number of photons, the distribution in FIG. 19 may not match the normal dye spectrum corresponding to the incorporated nucleotide, and the mismatch may lead to a chance of incorrect base determination.
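
The arithmetic in this example, and the scatter that a small photon count produces, can be illustrated with a short simulation. This is a sketch under assumptions: the ten relative weights in DYE_SPECTRUM are invented placeholders rather than a measured dye spectrum, the function names are hypothetical, and each detected photon is simply assigned to a spectral bin at random in proportion to those weights.

```python
import random


def photons_per_time_bin(emitted, detection_fraction, n_time_bins):
    """Expected number of detected photons in each time bin."""
    return emitted * detection_fraction / n_time_bins


def sample_spectrum(n_photons, dye_spectrum, seed=0):
    """Distribute n_photons over spectral bins, weighted by dye_spectrum."""
    rng = random.Random(seed)
    bins = list(range(len(dye_spectrum)))
    hits = rng.choices(bins, weights=dye_spectrum, k=n_photons)
    counts = [0] * len(dye_spectrum)
    for b in hits:
        counts[b] += 1
    return counts


# Placeholder relative weights for 10 spectral bins (assumed, not measured).
DYE_SPECTRUM = [1, 3, 8, 15, 20, 18, 12, 8, 4, 2]

expected = photons_per_time_bin(10_000, 0.04, 10)   # 10,000 x 4% / 10 bins
counts = sample_spectrum(round(expected), DYE_SPECTRUM)
print(expected, counts)
```

Changing the seed changes the shape of the sampled histogram noticeably, which is the mismatch with the normal dye spectrum that the paragraph above describes.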

In exemplary embodiments, to avoid a base determination problem caused by a small number of photons from a single incorporation event, the present teaching comprises a method 2000 for base determination illustrated by the flowchart in FIG. 20. As shown in FIG. 20, method 2000 comprises step 2010 in which data from all spectral bins collected in each of a plurality of consecutive time bins are copied and combined to form composite light data. Method 2000 further comprises step 2020 in which the composite data from the plurality of consecutive time bins are used to determine an incorporation time interval T.

FIG. 21 is a plot of the composite data over 24 time bins showing the number of photons detected in each time bin. As shown in FIG. 21, during time bins 1, 2, 3, 4, and 5, and time bins 16, 17, 18, and 19, the composite data indicate small numbers of photons coming from a background of diffusion events, trial events, and substrate fluorescence. In other time bins, the numbers of photons are significantly larger, resulting in large rises 2110 and 2120 above the background noise. Rises 2110 and 2120 indicate incorporation events. The incorporation time T for the incorporation event corresponding to rise 2110 can be determined by measuring the width of rise 2110, as shown in FIG. 21.
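
One simple way to locate such rises in the composite data is to threshold the per-time-bin photon counts and report each run of consecutive bins above the threshold. The sketch below assumes a fixed threshold and uses invented photon counts arranged like the description of FIG. 21 (low counts in time bins 1 to 5 and 16 to 19); the text does not specify how the width of a rise is measured, so the function find_rises is a hypothetical illustration.

```python
def find_rises(counts, threshold):
    """Return (first_bin, last_bin) pairs, 1-based and inclusive,
    for each run of consecutive time bins whose count exceeds threshold."""
    rises = []
    start = None
    for i, c in enumerate(counts, start=1):
        if c > threshold and start is None:
            start = i                      # a rise begins
        elif c <= threshold and start is not None:
            rises.append((start, i - 1))   # the rise ended at the prior bin
            start = None
    if start is not None:
        rises.append((start, len(counts)))  # rise runs to the last bin
    return rises


# Invented composite counts for 24 time bins (all spectral bins summed).
composite = (
    [3, 2, 4, 3, 2]                               # bins 1-5: background
    + [40, 38, 42, 41, 39, 37, 43, 40, 38, 41]    # bins 6-15: first rise
    + [3, 2, 4, 3]                                # bins 16-19: background
    + [41, 39, 40, 42, 38]                        # bins 20-24: second rise
)

rises = find_rises(composite, threshold=10)
widths = [last - first + 1 for first, last in rises]
print(rises, widths)
```

Here the width of each run, in time bins, plays the role of the incorporation time interval T.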

Since most of the photons detected during the incorporation time interval T are from a single incorporation event, for each spectral (color) bin, data collected in different time bins in the incorporation time interval can be combined, resulting in increased data points for the spectral bin. The increase in the number of data points leads to an improved multicomponenting process, which is used to convert color data to dye composition. Thus, method 2000 further comprises step 2030 in which data associated with each spectral bin or frequency band of interest but collected in different time bins in the incorporation time interval T are combined, and step 2040 in which the combined data is used in a conventional multicomponenting process to determine a dominant dye, which is used to determine the base being incorporated. Method 2000 further comprises step 2050 in which the residuals of the multicomponenting process are used to determine a confidence level.
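
Steps 2030 through 2050 can be sketched as follows, treating multicomponenting as an ordinary least-squares fit of the combined spectrum to a set of reference dye spectra. This is an assumption made for illustration: the text calls the process conventional without specifying it, the reference spectra and photon counts below are invented, and the mapping from residual to confidence level is not given in the text, so the sketch only reports the residual sum of squares.

```python
def combine_time_bins(frames):
    """Step 2030: sum counts per spectral bin over the time bins in T."""
    return [sum(col) for col in zip(*frames)]


def solve(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def multicomponent(signal, dye_spectra):
    """Steps 2040-2050: least-squares dye weights and residual."""
    k = len(dye_spectra)
    gram = [[sum(a * b for a, b in zip(dye_spectra[i], dye_spectra[j]))
             for j in range(k)] for i in range(k)]
    proj = [sum(a * b for a, b in zip(dye_spectra[i], signal))
            for i in range(k)]
    weights = solve(gram, proj)            # normal equations
    fit = [sum(w * s[n] for w, s in zip(weights, dye_spectra))
           for n in range(len(signal))]
    residual = sum((y - f) ** 2 for y, f in zip(signal, fit))
    return weights, residual


# Invented reference spectra over 6 spectral bins, one per dye.
DYES = [
    [0.6, 0.3, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.5, 0.3, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.3, 0.5, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.1, 0.3, 0.6],
]

# Invented photon counts from three time bins inside interval T.
frames = [[0, 5, 4, 1, 0, 0], [1, 6, 3, 1, 0, 0], [2, 4, 2, 1, 0, 0]]

combined = combine_time_bins(frames)
weights, residual = multicomponent(combined, DYES)
dominant = max(range(len(weights)), key=lambda i: weights[i])
print(combined, dominant, residual)
```

In this constructed example no single time bin matches a reference spectrum exactly, but the combined counts match the second dye, so the fit selects it with a residual near zero; a large residual would indicate a poor fit and therefore lower confidence in the call.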

Method 2000 for improving the signal to noise ratio by combining data from multiple time bins may be coded as a computer program and executed by computer system 140. Alternatively, since the same algorithm in method 2000 is executed a large number of times, hardware solutions such as field-programmable gate arrays (FPGAs) or digital signal processors (DSPs) 150 and the like can be used to reduce the computation load and data stream size. The FPGAs or DSPs 150 could be integrated in detector(s) 1330, placed between detector 1330 and computer system 140, as shown in FIG. 1, or installed into computer system 140.

The foregoing descriptions of specific embodiments of the present teaching have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the teaching to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the teaching and its practical application, to thereby enable others skilled in the art to best use the teaching and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the teaching be defined by the claims appended hereto and their equivalents.

1.-60. (canceled)
61. An apparatus for sequencing a plurality of target nucleic acid molecules, comprising: a sample holder configured to separate and confine a plurality of source points each comprising a single one of the target nucleic acid molecules, a fraction of a nucleic acid molecule, or a nucleic acid polymerizing enzyme molecule; a light source configured to direct excitation light toward the sample holder at an angle with respect to a normal of the sample holder, the excitation light illuminating the source points and causing the source points to fluoresce; at least one detector; and an optical assembly configured to collect fluorescent signals from illuminated source points to form images of the source points on the at least one detector.
62. The apparatus of claim 61 wherein the at least one detector has multiple pixel elements and the optical assembly is configured to form spatially resolved images of the source points on the at least one detector.
63. The apparatus of claim 62 wherein the angle is within 10° of a Brewster's angle associated with the sample holder.
64. The apparatus of claim 63 wherein the excitation light is linearly polarized with an electric field vector parallel to a plane of incidence upon the sample holder.
65. The apparatus of claim 61 wherein the sample holder comprises a transparent substrate and the excitation light enters the transparent substrate at an angle selected to cause total internal reflection when a reflected portion of the excitation light impinges upon an internal surface of the transparent substrate.
66. The apparatus of claim 65 wherein the sample holder further comprises a transparent member in direct or fluidic contact with the transparent substrate and the light source is configured to direct the excitation light toward the sample holder such that the excitation light enters the transparent substrate through the transparent member.
67. The apparatus of claim 65 wherein the sample holder further comprises a prism, grism or grating formed on or attached to a portion of a bottom surface of the transparent substrate, and wherein the excitation light is coupled into the transparent substrate through the prism, grism or grating.
68. The apparatus of claim 65 wherein the excitation light is directed toward a side of the transparent substrate.
69. The apparatus of claim 68 wherein the side has a beveled surface forming an angle with a bottom surface of the transparent substrate, the bottom surface being opposite to a top surface for supporting the plurality of source points.
70. The apparatus of claim 62 wherein the at least one detector is a CCD-based detector selected from a group consisting of EMCCD, ICCD, and EBCCD.
71. The apparatus of claim 62, wherein the optical assembly comprises at least one spectrum-dispersing device and each image of a source point on the at least one detector is spectrally resolved.
72. The apparatus of claim 71 wherein the spectrum-dispersing device is a grating, prism, or grism.
73. The apparatus of claim 72 wherein the optical assembly comprises a plurality of dichroic, highpass, lowpass, notch, or bandpass filters configured to direct fluorescent light of different ranges of wavelengths from each source point to different pixel elements of the at least one detector or to different ones of the at least one detector.
74. The apparatus of claim 61 wherein the sample holder comprises a metallic film over a transparent substrate, the metallic film having patterns configured to spatially confine and separate the plurality of source points.
75. The apparatus of claim 74 wherein the patterns form channels in the metallic film, each channel having a width that is smaller than a wavelength of the excitation light and a length sufficiently large to allow an oligonucleotide molecule to be stretched along the channel.
76. The apparatus of claim 75 wherein the light source is configured to direct linearly polarized excitation light toward the sample holder, and wherein an electric field vector in the excitation light is oriented along a width direction of the channels.